
Tracking an MLJ experiment using MLFlowClient.jl

Here the `max_depth` hyperparameter of a decision tree is tuned. Each candidate value is logged to MLFlow as a run parameter, together with the evaluation measures (accuracy, log loss, misclassification rate, Brier score) for that model.

md"""
# Tracking an MLJ experiment using MLFlowClient.jl
Here the `max_depth` hyperparameter of a decision tree is tuned. Each candidate value is logged to MLFlow as a run parameter, together with the evaluation measures (accuracy, log loss, misclassification rate, Brier score) for that model.
"""
87.7 μs
begin
    using Pkg
    Pkg.activate("./")

    using Random
    Random.seed!(9)
end
  Activating project at `~/git/ds_portfolio/notebooks/mlflow_with_mlj`
102 ms
using MLJ, DataFrames
1.8 s

Data ingestion

md"### Data ingestion"
68.6 μs
iris
 Row │ sepal_length  sepal_width  petal_length  petal_width  target
     │ Float64       Float64      Float64       Float64      CategoricalValue
─────┼────────────────────────────────────────────────────────────────────────
   1 │ 5.1           3.5          1.4           0.2          "setosa"
   2 │ 4.9           3.0          1.4           0.2          "setosa"
   3 │ 4.7           3.2          1.3           0.2          "setosa"
   4 │ 4.6           3.1          1.5           0.2          "setosa"
   5 │ 5.0           3.6          1.4           0.2          "setosa"
   6 │ 5.4           3.9          1.7           0.4          "setosa"
   7 │ 4.6           3.4          1.4           0.3          "setosa"
   8 │ 5.0           3.4          1.5           0.2          "setosa"
   9 │ 4.4           2.9          1.4           0.2          "setosa"
  10 │ 4.9           3.1          1.5           0.1          "setosa"
  ⋮  │      ⋮             ⋮            ⋮             ⋮            ⋮
 150 │ 5.9           3.0          5.1           1.8          "virginica"
iris = load_iris() |> DataFrames.DataFrame
1.6 s
┌──────────────┬───────────────┬──────────────────────────────────┐
│ names        │ scitypes      │ types                            │
├──────────────┼───────────────┼──────────────────────────────────┤
│ sepal_length │ Continuous    │ Float64                          │
│ sepal_width  │ Continuous    │ Float64                          │
│ petal_length │ Continuous    │ Float64                          │
│ petal_width  │ Continuous    │ Float64                          │
│ target       │ Multiclass{3} │ CategoricalValue{String, UInt32} │
└──────────────┴───────────────┴──────────────────────────────────┘
schema(iris)
16.1 ms
train, test = partition(iris, 0.8, shuffle=true)
79.6 ms
train_y, train_X = unpack(train, ==(:target))
41.1 ms
test_y, test_X = unpack(test, ==(:target))
85.7 μs
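`partition(iris, 0.8, shuffle=true)` shuffles the rows before splitting them 80/20. The same idea can be sketched in plain Julia; `split_indices` below is a hypothetical helper for illustration, not MLJ's actual implementation:

```julia
using Random

# Hypothetical sketch of a shuffled 80/20 row split (not MLJ's `partition`).
function split_indices(n::Int, fraction::Float64; rng=Random.default_rng())
    idx = shuffle(rng, 1:n)              # random permutation of row indices
    cut = floor(Int, fraction * n)       # size of the training portion
    return idx[1:cut], idx[cut+1:end]    # (train indices, test indices)
end

train_idx, test_idx = split_indices(150, 0.8)  # 120 train rows, 30 test rows
```

`unpack` then separates the `:target` column from the feature columns within each split.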

MLFlowClient setup

md"""
### MLFlowClient setup
"""
64.0 μs
using MLFlowClient
13.6 ms
mlf
MLFlowClient.MLFlow(
    baseuri = "http://localhost:5000", 
    apiversion = 2.0
)
mlf = MLFlow("http://localhost:5000")
7.1 μs
MLFlowClient.MLFlowExperiment(
    name = "iris_classification", 
    lifecycle_stage = "active", 
    experiment_id = 645006071875603648, 
    tags = missing, 
    artifact_location = "/home/pebeto/git/ds_portfolio/notebooks/mlflow_with_mlj/iris-artifacts"
)
if ismissing(getexperiment(mlf, "iris_classification"))
    experiment_id = createexperiment(mlf; name="iris_classification", artifact_location="./iris-artifacts")
else
    experiment_id = getexperiment(mlf, "iris_classification")
end
3.7 s

Modeling

md"""
### Modeling
"""
68.6 μs
DecisionTreeClassifier
MLJDecisionTreeInterface.DecisionTreeClassifier
DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree
For silent loading, specify `verbosity=0`. 
import MLJDecisionTreeInterface ✔
255 ms
dtc
DecisionTreeClassifier(
  max_depth = -1, 
  min_samples_leaf = 1, 
  min_samples_split = 2, 
  min_purity_increase = 0.0, 
  n_subfeatures = 0, 
  post_prune = false, 
  merge_purity_threshold = 1.0, 
  display_depth = 5, 
  feature_importance = :impurity, 
  rng = Random._GLOBAL_RNG())
6.5 μs
max_depth_range = range(dtc, :max_depth, lower=2, upper=10, scale=:linear);
11.7 ms
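Over the integer-valued `max_depth`, this linear range from 2 to 10 yields nine candidate depths, which is why the fit below reports evaluating 9 metamodels. Assuming a unit step over the integers, the grid is:

```julia
# Integer depths 2 through 10 inclusive: one candidate model per depth.
candidate_depths = collect(2:10)
length(candidate_depths)  # 9 candidates
```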
model
ProbabilisticTunedModel(
  model = DecisionTreeClassifier(
        max_depth = -1, 
        min_samples_leaf = 1, 
        min_samples_split = 2, 
        min_purity_increase = 0.0, 
        n_subfeatures = 0, 
        post_prune = false, 
        merge_purity_threshold = 1.0, 
        display_depth = 5, 
        feature_importance = :impurity, 
        rng = Random._GLOBAL_RNG()), 
  tuning = Grid(
        goal = nothing, 
        resolution = 10, 
        shuffle = true, 
        rng = Random._GLOBAL_RNG()), 
  resampling = CV(
        nfolds = 6, 
        shuffle = false, 
        rng = Random._GLOBAL_RNG()), 
  measure = MLJBase.Measure[Accuracy(), LogLoss(tol = 2.220446049250313e-16), MisclassificationRate(), BrierScore()], 
  weights = nothing, 
  class_weights = nothing, 
  operation = nothing, 
  range = NumericRange(2 ≤ max_depth ≤ 10; origin=6.0, unit=4.0), 
  selection_heuristic = MLJTuning.NaiveSelection(nothing), 
  train_best = true, 
  repeats = 1, 
  n = nothing, 
  acceleration = CPU1{Nothing}(nothing), 
  acceleration_resampling = CPU1{Nothing}(nothing), 
  check_measure = true, 
  cache = true)
model = TunedModel(
    model=dtc,
    resampling=CV(),
    tuning=Grid(),
    range=max_depth_range,
    measure=[accuracy, log_loss, misclassification_rate, brier_score]
)
17.7 ms
mach
untrained Machine; does not cache data
  model: ProbabilisticTunedModel(model = DecisionTreeClassifier(max_depth = -1, …), …)
  args: 
    1:	Source @344 ⏎ ScientificTypesBase.Table{AbstractVector{ScientificTypesBase.Continuous}}
    2:	Source @690 ⏎ AbstractVector{ScientificTypesBase.Multiclass{3}}
mach = machine(model, train_X, train_y)
156 ms
trained Machine; does not cache data
  model: ProbabilisticTunedModel(model = DecisionTreeClassifier(max_depth = -1, …), …)
  args: 
    1:	Source @344 ⏎ ScientificTypesBase.Table{AbstractVector{ScientificTypesBase.Continuous}}
    2:	Source @690 ⏎ AbstractVector{ScientificTypesBase.Multiclass{3}}
fit!(mach)
Training machine(ProbabilisticTunedModel(model = DecisionTreeClassifier(max_depth = -1, …), …), …).
Attempting to evaluate 9 models.
Evaluating over 9 metamodels: 100%[=========================] Time: 0:00:06
8.8 s

Evaluating

md"### Evaluating"
63.1 μs
model_values
model_values = report(mach).history .|> (x -> (x.measure, x.measurement, x.model.max_depth))
42.5 ms
# Create one MLFlow run per evaluated model, logging its hyperparameter and metrics
for (measure, measurements, max_depth) in model_values
    exprun = createrun(mlf, experiment_id)
    logparam(mlf, exprun, "max_depth", max_depth)
    measures_names = [x.name for x in measure .|> info]
    for (name, val) in zip(measures_names, measurements)
        logmetric(mlf, exprun, "$(name)", val)
    end
end
93.9 ms
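Beyond logging, the tuning history can also be scanned locally to pick the best depth, e.g. by the highest accuracy among the `(measure, measurements, max_depth)` tuples. A minimal sketch with made-up `(accuracy, max_depth)` pairs (illustrative values only, not the actual run results):

```julia
# Hypothetical (accuracy, max_depth) pairs in the shape produced above;
# the numbers are illustrative, not real results.
history = [(0.90, 2), (0.95, 3), (0.93, 4)]

# Select the entry with the highest accuracy (first element of each tuple).
best_acc, best_depth = argmax(first, history)
```

MLJ performs an equivalent selection internally: `report(mach).best_model` holds the winning model under the tuned model's selection heuristic.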